Hiding Global Communication Latency in the GMRES Algorithm on Massively Parallel Machines
Abstract
In the Generalized Minimal Residual Method (GMRES), the global all-to-all communication required in each iteration for orthogonalization and normalization of the Krylov basis vectors is becoming a performance bottleneck on massively parallel machines. Long latencies, system noise and load imbalance turn these global reductions into very costly global synchronizations. In this work, we propose the use of non-blocking or asynchronous global reductions to hide these global communication latencies by overlapping them with other communications and calculations. A pipelined variation of GMRES is presented in which the result of a global reduction is only used one or more iterations after the communication phase has started. This way, global synchronization is relaxed and scalability is much improved at the expense of some extra computations. The numerical instabilities that inevitably arise from the typical monomial basis, which is built by repeatedly applying the matrix, are reduced and often annihilated by using Newton or Chebyshev bases instead. An analytical model is used to predict the performance on massively parallel machines.
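The remark about monomial versus Newton bases can be illustrated with a small NumPy sketch. It is not taken from the paper: the diagonal test matrix, the basis size `m`, the helper names `krylov_basis` and `leja_order`, and the choice of Chebyshev nodes in Leja order as shifts are all assumptions made for illustration (in practice, Ritz values are one common source of shifts). The sketch builds both bases without orthogonalization and compares their condition numbers.

```python
import numpy as np

def krylov_basis(A, v0, m, shifts=None):
    """Return an n x (m+1) matrix whose columns span a Krylov subspace.

    shifts=None builds the monomial basis v, Av, A^2 v, ...
    Otherwise column k+1 is (A - shifts[k] I) applied to column k (Newton basis).
    Columns are scaled to unit length but not orthogonalized.
    """
    n = v0.shape[0]
    V = np.empty((n, m + 1))
    V[:, 0] = v0 / np.linalg.norm(v0)
    for k in range(m):
        w = A @ V[:, k]
        if shifts is not None:
            w = w - shifts[k] * V[:, k]
        V[:, k + 1] = w / np.linalg.norm(w)
    return V

def leja_order(points):
    """Greedy Leja ordering: each new point maximizes the product
    of its distances to the points already chosen."""
    remaining = list(points)
    ordered = [max(remaining, key=abs)]
    remaining.remove(ordered[0])
    while remaining:
        nxt = max(remaining,
                  key=lambda p: np.prod([abs(p - q) for q in ordered]))
        ordered.append(nxt)
        remaining.remove(nxt)
    return np.array(ordered)

# Hypothetical test problem (an assumption, not from the paper):
# a diagonal matrix with eigenvalues spread over [1, 100].
n, m = 200, 12
lam_min, lam_max = 1.0, 100.0
A = np.diag(np.linspace(lam_min, lam_max, n))
v0 = np.ones(n)

# Assumed shifts: Chebyshev nodes on [lam_min, lam_max], in Leja order.
i = np.arange(m)
center, radius = (lam_max + lam_min) / 2, (lam_max - lam_min) / 2
cheb_nodes = center + radius * np.cos((2 * i + 1) * np.pi / (2 * m))
shifts = leja_order(cheb_nodes)

V_monomial = krylov_basis(A, v0, m)
V_newton = krylov_basis(A, v0, m, shifts=shifts)

cond_monomial = np.linalg.cond(V_monomial)
cond_newton = np.linalg.cond(V_newton)
print(f"monomial basis condition number: {cond_monomial:.3e}")
print(f"Newton basis condition number:   {cond_newton:.3e}")
```

A smaller condition number means the basis vectors are further from linear dependence, which is the property that matters when orthogonalization is delayed by one or more iterations, as in the pipelined variant described above.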
Similar resources
The Communication-Hiding Conjugate Gradient Method with Deep Pipelines
Krylov subspace methods are among the most efficient present-day solvers for large scale linear algebra problems. Nevertheless, classic Krylov subspace method algorithms do not scale well on massively parallel hardware due to the synchronization bottlenecks induced by the computation of dot products throughout the algorithms. Communication-hiding pipelined Krylov subspace methods offer increase...
Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm
Scalability of Krylov subspace methods suffers from costly global synchronization steps that arise in dot-products and norm calculations on parallel machines. In this work, a modified Conjugate Gradient (CG) method is presented that removes the costly global synchronization steps from the standard CG algorithm by only performing a single non-blocking reduction per iteration. This global communi...
A Global GMRES/Multi-grid Scheme for an Adaptive Cartesian/Quad Grid Flow Solver on Distributed Memory Machines
A global multi-grid/GMRES solution methodology on distributed memory machines is successfully developed in this study. To preserve the effectiveness of the multigrid scheme, the grid partitioning is based on the communication graph of the coarsest grid, so that all levels of the multi-grids are located in the same zone (processor). Each node of the graph is weighted with the total number of the...
The communication-hiding pipelined BiCGstab method for the parallel solution of large unsymmetric linear systems
A High Performance Computing alternative to traditional Krylov subspace methods, pipelined Krylov subspace solvers offer better scalability in the strong scaling limit compared to standard Krylov subspace methods for large and sparse linear systems. The typical synchronization bottleneck is mitigated by overlapping time-consuming global communication phases with local computations in the algori...
Solving the Problem of Scheduling Unrelated Parallel Machines with Limited Access to Jobs
Nowadays, with the successful application of the on-time production concept to other areas such as production management and storage, completing the processing of jobs by their delivery time is considered a key issue in industrial environments. Unrelated parallel machine scheduling is a general form of the classic parallel machine problems. In some of the applications of unrelated parallel mac...
Journal title:
- SIAM J. Scientific Computing
Volume 35
Publication date: 2013